Report  Documentation  Page 


Form  Approved 
OMB  No.  0704-0188 




1.  REPORT  DATE 

2008 

2.  REPORT  TYPE 

3.  DATES  COVERED 

00-00-2008  to  00-00-2008 

4.  TITLE  AND  SUBTITLE 

5a.  CONTRACT  NUMBER 

AI  and  Mental  Imagery 

5b.  GRANT  NUMBER 

5c.  PROGRAM  ELEMENT  NUMBER 

6.  AUTHOR(S) 

5d.  PROJECT  NUMBER 

5e.  TASK  NUMBER 

5f.  WORK  UNIT  NUMBER 

7.  PERFORMING  ORGANIZATION  NAME(S)  AND  ADDRESS(ES) 

United States Military Academy, EECS, West Point, NY 10996

8.  PERFORMING  ORGANIZATION 

REPORT  NUMBER 

9.  SPONSORING/MONITORING  AGENCY  NAME(S)  AND  ADDRESS(ES) 

10.  SPONSOR/MONITOR'S  ACRONYM(S) 

11.  SPONSOR/MONITOR'S  REPORT 
NUMBER(S) 

12.  DISTRIBUTION/AVAILABILITY  STATEMENT 

Approved  for  public  release;  distribution  unlimited 

13.  SUPPLEMENTARY  NOTES 

14.  ABSTRACT 

15.  SUBJECT  TERMS 

16.  SECURITY  CLASSIFICATION  OF: 


a.  REPORT 

unclassified 


b.  ABSTRACT 

unclassified 


c.  THIS  PAGE 

unclassified 


17.  LIMITATION  OF 
ABSTRACT 

Same  as 
Report  (SAR) 


18.  NUMBER 
OF  PAGES 

2 


19a.  NAME  OF 
RESPONSIBLE  PERSON 


Standard  Form  298  (Rev.  8-98) 

Prescribed  by  ANSI  Std  Z39-18 


AI  and  Mental  Imagery 


Samuel Wintermute

University  of  Michigan 
2260  Hayward  St. 

Ann  Arbor,  MI  48109-2121 
swinterm@umich.edu 


Introduction 

Vision  and  space  are  prominent  modalities  in  our 
experiences  as  humans.  We  live  in  a  richly  visual  world, 
and  are  constantly  and  acutely  aware  of  our  position  in 
space  and  our  surroundings.  In  contrast  to  this  seemingly 
precise  awareness,  we  are  also  able  to  reason  abstractly, 
use  language,  and  construct  arbitrary  hypothetical 
scenarios. 

In  this  position  paper,  we  present  an  AI  system  we  are 
building  to  work  towards  human  capability  in  visuospatial 
processing.  We  use  mental  imagery  processing  as  our 
psychological  basis  and  integrate  it  with  symbolic 
processing.  To  design  this  system,  we  are  considering 
constraints  from  the  natural  world  (as  described  by 
psychology  and  neuroscience),  and  those  uncovered  by  AI 
research.  In  doing  so,  we  hope  to  address  the  gap  between 
abstract  reasoning  and  detailed  perception. 

Constraints  from  AI  and  Psychology 

Historically,  one  of  the  most  prominent  approaches  to  AI 
has  been  symbol  processing.  While  purely  symbolic  AI  has 
its  weaknesses,  it  has  some  very  important  strengths. 
Symbols  allow  for  very  general  reasoning  and  can  be 
composed  together  to  create  arbitrary  hypothetical 
situations (Newell, 1990). Humans also exhibit the ability
to create arbitrary situations, and since symbols are a good
AI answer for this capability, we take this as a constraint
on our system: it must use symbolic reasoning.
Specifically,  we  are  pursuing  our  research  in  the  context  of 
the  Soar  cognitive  architecture  (Laird,  2008),  which 
includes  symbolic  processing. 

Symbolic  AI  systems  typically  use  qualitative  reasoning, 
where  a  higher-level  representation  of  the  continuous 
world  is  reasoned  over,  rather  than  precise  information  as 
might  be  provided  by  the  senses.  Much  work  in  AI  has 
focused  on  finding  appropriate  qualitative  representations 
of space, but this work has led to the poverty conjecture
of  Forbus  et  al.  (1991),  that  “there  is  no  purely  qualitative, 
general-purpose  representation  of  spatial  properties”.  If 
this  is  true,  it  places  another  constraint  on  our  system:  it 
must  employ  a  non-qualitative  representation  of  space. 

Looking  to  psychology,  a  relevant  area  of  study  is 
mental  imagery  (Kosslyn,  2006).  We  have  been 
investigating two forms of imagery, spatial and visual. In
both  forms  it  appears  the  brain  activates  its  perception 
system  from  the  top  down,  imagining  objects  in  the  same 
systems  that  we  normally  associate  with  perception. 
Humans  seem  to  have  specialized  systems  to  handle  spatial 
and  visual  information,  and  imagery  brings  these  systems 
under  the  umbrella  of  cognition,  since  they  are  used  for 
more  than  simply  translating  sensory  information  into 
higher-level  representations.  This  is  another  constraint  in 
our  system:  it  must  model  human  spatial  and  visual 
imagery  by  including  representations  and  specialized  forms 
of  processing  associated  with  these  forms  of  imagery. 

The  SVS  Spatial  /  Visual  System 

Soar+SVI  (Soar  Spatial  and  Visual  Imagery;  Lathrop  and 
Laird,  2007;  Lathrop,  2008)  is  a  system  created  to  study 
spatial  and  visual  imagery  with  Soar,  and  SRS  (Spatial 
Reasoning  for  Soar;  Wintermute  and  Laird,  2007,  2008)  is 
a  system  created  to  explore  problem  solving  with  a  spatial 
representation,  focusing  on  the  translation  between  the 
spatial and symbolic layers. These systems are being
combined and improved in a new system, the Spatial and
Visual System, or SVS (Figure 1), an extension to Soar.

SVS has two short-term memories (STMs) and one long-term
memory (LTM). The two STMs are for visual information (roughly
corresponding  to  the  visual  cortex),  and  spatial  information 
(roughly  corresponding  to  a  region  in  the  parietal  cortex). 
The Visual Buffer is retinotopically mapped and
two-dimensional. It represents strictly visual information, such
as color and exact shape. The retinotopic brain structure
corresponds to a depictive structure in the computer (a
bitmap image), where empty space is represented
explicitly. The Spatial Scene is three-dimensional, and
could  be  inferred  from  any  modality,  but  we  assume  that  it 
is  inferred  from  the  Visual  Buffer  during  perception.  In 
virtual  environments,  it  is  possible  to  connect  the  Spatial 
Scene  directly  to  the  environment,  as  those  environments 
typically  use  spatial  encodings  natively.  SVS  also  includes 
a  perceptual  LTM,  for  both  spatial  and  visual  information. 
All  of  these  memories  communicate  with  Soar  via  a 
qualitative  interface:  no  low-level  spatial  or  visual 
information  is  present  there,  only  high-level  information 
such  as  object  identities  and  relationships  between  objects. 
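The memory layout described above can be illustrated with a minimal sketch. It is not the SVS implementation: the names `SpatialObject`, `make_visual_buffer`, `depict`, and `qualitative_relations`, and the two relations `left-of` and `above`, are assumptions chosen for illustration. The sketch shows a depictive buffer in which empty cells are stored explicitly, a spatial scene of objects with 3D coordinates, and an interface that passes only symbolic relations upward.

```python
from dataclasses import dataclass

EMPTY = 0  # empty space is stored explicitly in the depictive buffer


@dataclass(frozen=True)
class SpatialObject:
    """Hypothetical spatial-scene entry: an identity plus 3D coordinates."""
    name: str
    x: float
    y: float
    z: float


def make_visual_buffer(width, height):
    """Create a 2D bitmap-like buffer in which every cell starts out empty."""
    return [[EMPTY for _ in range(width)] for _ in range(height)]


def depict(buffer, object_id, cells):
    """Mark the given (row, col) cells as occupied by object_id."""
    for row, col in cells:
        buffer[row][col] = object_id


def qualitative_relations(scene):
    """Return symbolic (relation, a, b) triples; no coordinates are exposed."""
    facts = set()
    for a in scene:
        for b in scene:
            if a is b:
                continue
            if a.x < b.x:
                facts.add(("left-of", a.name, b.name))
            if a.z > b.z:
                facts.add(("above", a.name, b.name))
    return facts
```

In this sketch, the symbolic layer would only ever see the triples returned by `qualitative_relations`, while the coordinates and pixel cells stay inside the imagery layer.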

The  previous  systems  are  subsets  of  SVS,  and  we  have 
reported  several  results  in  studying  them.  The  constrained, 
representation-specific  processing  in  the  spatial  and  visual 
systems  provides  a  functional  advantage  and  is  more 
efficient  than  processing  the  same  information 
symbolically  (Lathrop,  2008).  The  use  of  imagery  can 
allow  complicated  reasoning  processes  such  as  path 
planning  to  be  split  between  abstract  symbolic  reasoning 
processes  and  precise  spatial  processes  (Wintermute  and 
Laird,  2007),  making  an  overall  system  that  is  both  general 
across  problems  and  precise  within  problems.  Spatial 
imagery  also  allows  symbolic  AI  to  address  problems 
involving  fine-grained  continuous  motion  (Wintermute  and 
Laird,  2008).  A  common  theme  in  all  of  this  work  is  that  a 
system  with  imagery  is  able  to  symbolically  compose 
hypothetical  scenarios,  which  can  then  be  precisely 
interpreted  by  using  imagery  (e.g.,  “What  if  I  tried  to  move 
around  this  obstacle?”  “If  the  enemy  was  sitting  by  the  hill, 
could  my  teammate  see  him?”). 

For  example,  an  agent  operating  with  a  teammate  in  an 
environment  with  obstacles  and  adversaries  may  use 
imagery  to  determine  if  its  teammate  is  in  a  good  position 
to overwatch an approaching enemy (Lathrop, 2008). To
do this, Soar formulates a symbolic description of the
hypothesized  position  of  the  enemy  and  teammate,  which 
is  then  interpreted  by  the  imagery  system.  Soar  can  then 
query  the  imagery  system  for  qualitative  implications  of 
the  situation,  such  as  “Does  the  region  viewed  by  my 
teammate  intersect  the  enemy?”. 
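A query of this kind can be sketched as follows, under strong simplifying assumptions that are not taken from the paper: the scene is two-dimensional, obstacles are circles, and the teammate's viewed region is reduced to a single line of sight to the enemy. The names `Circle`, `segment_hits_circle`, `can_see`, and `imagine_and_query` are hypothetical.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class Circle:
    """Hypothetical obstacle: a disc with center (x, y) and radius r."""
    x: float
    y: float
    r: float


def segment_hits_circle(p, q, c):
    """True if the segment p-q passes within c.r of the circle's center."""
    px, py = p
    qx, qy = q
    dx, dy = qx - px, qy - py
    length_sq = dx * dx + dy * dy
    if length_sq == 0.0:
        t = 0.0  # degenerate segment: p and q coincide
    else:
        # Project the center onto the segment and clamp to its endpoints.
        t = ((c.x - px) * dx + (c.y - py) * dy) / length_sq
        t = max(0.0, min(1.0, t))
    nearest_x, nearest_y = px + t * dx, py + t * dy
    return math.hypot(c.x - nearest_x, c.y - nearest_y) <= c.r


def can_see(viewer, target, obstacles):
    """Qualitative answer: is the line of sight free of every obstacle?"""
    return not any(segment_hits_circle(viewer, target, o) for o in obstacles)


def imagine_and_query(description, obstacles):
    """Place the hypothesized teammate and enemy, then return a boolean.

    description is a symbolic-style mapping of names to imagined positions,
    e.g. {"teammate": (0.0, 0.0), "enemy": (10.0, 0.0)}.
    """
    return can_see(description["teammate"], description["enemy"], obstacles)
```

The symbolic side supplies only the hypothesized placement and receives only a yes/no answer; the geometric computation stays inside the imagery layer.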

Future  Work 

An  advantage  of  examining  visuospatial  processing  from 
an  imagery  standpoint  is  that  we  can  make  progress 
without  having  to  address  every  problem  typically  found  in 
vision  research.  It  is  much  easier  to  derive  a  visual 
representation  of  a  known  object  in  a  known  position  than 
it  is  to  identify  an  unknown  object  and  infer  its  position.  As 
the  processes  and  representations  used  in  mental  imagery 
are  shared  with  perception,  studying  imagery  in  AI  should 
aid  the  study  of  computer  vision.  In  particular,  creating  an 
imagery  system  requires  us  to  determine  what  is  and  is  not 
a  sufficient  system  for  representing  and  using  visuospatial 
knowledge.  We  hope  that  this  will  further  constrain  the 
vision  problem,  aiding  research  in  that  area.  Similarly,  as 
humans  are  able  to  solve  the  same  problems  our  system  is 
addressing, exploring the details of what architecture is
needed to perform this type of non-logical reasoning can
aid  the  further  development  of  psychological  theories. 

To  further  these  goals,  we  are  working  on  extending  our 
system  towards  robotics.  We  believe  that  mental  imagery 
can  provide  a  key  link  in  robotics  systems  attempting  to 
incorporate  a  full  range  of  capability,  from  the 
sensor/effector level to the cognitive level. Pursuing this
presents  many  scientific  and  engineering  challenges.  Most 
importantly,  methods  are  needed  to  translate  common 
robotic  sensory  information  into  spatial  objects  or 
references  to  prototypical  objects  in  LTM,  which  can  then 
be  retrieved  and  used  in  reasoning. 

In  addition  to  developing  SVS  as  an  AI  system,  we  have 
long-term  plans  to  extend  SVS  to  model  the  details  of 
perceptual  attention.  This  capability  should  allow  SVS  to 
serve  more  directly  as  a  psychological  model,  since  its 
results  could  be  matched  against  human  data.  In  addition, 
this  kind  of  modeling  should  force  the  system  to 
encompass  a  theory  of  the  timing  of  object  recognition, 
which  will  move  it  closer  to  addressing  the  mechanisms  of 
object  recognition. 

Conclusion 

We have been working to integrate a naturally inspired
component, mental imagery, with an existing AI system,
Soar. This integration has increased the capabilities of the
AI  system,  and  has  opened  up  interesting  research 
directions  in  both  AI  and  psychological  modeling. 